


Clustering US Counties to Find Patterns Related to the COVID-19 Pandemic

Brown, Cora, Milstein, Sarah, Sun, Tianyi, Zhao, Cooper

arXiv.org Artificial Intelligence

When COVID-19 first started spreading and quarantine was implemented, the Society for Industrial and Applied Mathematics (SIAM) Student Chapter at the University of Minnesota-Twin Cities began a collaboration with Ecolab to use our skills as data scientists and mathematicians to extract useful insights from data relating to the pandemic. This collaboration consisted of multiple groups working on different projects. In this write-up we focus on using clustering techniques to find groups of similar US counties, and on using those groups to better understand the pandemic. Our team for this project consisted of University of Minnesota students Cora Brown, Sarah Milstein, Tianyi Sun, and Cooper Zhao, with help from Ecolab Data Scientist Jimmy Broomfield and University of Minnesota student Skye Ke. In the sections below we describe all of the work done for this project. In Section 2, we list the data we gathered, as well as the feature engineering we performed. In Section 3, we describe the metrics we used for evaluating our models. In Section 4, we explain the methods we used for interpreting the results of our various clustering approaches. In Section 5, we describe the different clustering methods we implemented. In Section 6, we present the results of our clustering techniques and provide relevant interpretation. Finally, in Section 7, we provide some concluding remarks comparing the different clustering methods.
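The core idea of grouping counties by similarity can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. The feature matrix below is synthetic stand-in data, not the project's actual county features, and the write-up's own clustering methods may differ:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then recompute centroids, repeating for `iters` rounds."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical per-county features (e.g. standardized case rate and
# population density) drawn as two synthetic groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

In practice one would standardize real county features first, since k-means is sensitive to feature scale.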


4 questions with Rush CIO Dr. Shafiq Rab

#artificialintelligence

Dr. Shafiq Rab, CIO of Rush University Medical Center in Chicago, uses his background in public health to inform his IT vision. Dr. Rab, who completed his medical degree and internal medicine residency at Karachi, Pakistan-based Dow Medical College, had his interest in public health piqued during one of his first physician jobs. While treating patients in an urban squatter settlement in Pakistan, he worked with non-governmental organizations to reduce the infant mortality rate, mainly by bringing clean drinking water to residents. "That's how I got involved in healthcare," he says. "And I remain committed to healthcare."


Translating Embeddings for Modeling Multi-relational Data

Bordes, Antoine, Usunier, Nicolas, Garcia-Duran, Alberto, Weston, Jason, Yakhnenko, Oksana

Neural Information Processing Systems

We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Moreover, it can be successfully trained on a large-scale dataset with 1M entities, 25k relationships and more than 17M training samples.
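The translation assumption behind TransE, that head + relation should land near tail, can be sketched in a few lines. The entity names, embedding size, and hand-set vectors below are illustrative only (real TransE learns the embeddings by minimizing a margin-based ranking loss over corrupted triples):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # toy embedding size; the paper tunes this per dataset

# Hypothetical toy vocabulary, not from the paper's benchmarks.
entities = {name: rng.normal(size=dim)
            for name in ["paris", "france", "tokyo", "japan"]}
relation = rng.normal(size=dim)  # e.g. a "capital_of" relation vector

def transe_score(h, r, t):
    """TransE dissimilarity d(h + r, t), here the L2 norm.
    Lower means the triple (h, r, t) is judged more plausible."""
    return np.linalg.norm(h + r - t)

# Force the translation property h + r = t for the true pairs,
# standing in for what training would approximately learn.
entities["france"] = entities["paris"] + relation
entities["japan"] = entities["tokyo"] + relation

true_score = transe_score(entities["paris"], relation, entities["france"])
corrupt_score = transe_score(entities["paris"], relation, entities["japan"])
```

Link prediction then amounts to ranking all candidate tails by this score: the correct tail ("france") scores lower (better) than the corrupted one ("japan").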